1 Executive Summary


The project is aimed at developing a comprehensive framework for control of information collection,
fusion, and inference from diverse modalities. Our research has been organized under three
inter-related thrusts. The first thrust addresses system modeling and local information processing.
The second thrust emphasizes the interaction between information and control at different
abstraction levels. The third thrust is focused on decentralized processing and interactive fusion.

Within  Thrust  1,  we  focused  on  exploring,  developing,  and  utilizing  mathematical  models  for 
hard  and  soft  observations,  the  physical  and  abstract  information  states,  and  the  sensing  state  for 
local  information  processing  and  inference: 

• Sensing-aware inference with high-dimensional signals. [1-9] Here, motivated by applications
involving the control of information collection, we analyzed the fundamental limits of
supervised  inference  in  problems  where  the  observations  of  the  state  are  high-dimensional, 
indirect,  and  noisy  but  the  sensing  process  has  an  underlying  low-dimensional  structure 
which  is  partially  known. 

•  Discovering  latent  patterns  in  high-dimensional  data.  [10-19]  Here  we  studied  the  problem 
of  modeling  and  discovering  salient  latent  topics  or  patterns  in  soft  and  hard  observations 
with  provable  performance  guarantees.  Applications  include  higher-level  inference  tasks 
such as inferring “intent” and other abstract patterns of behavior from soft data, e.g., Twitter
feeds,  text  and  email  messages,  text-based  event  transcripts,  expert  assessments,  etc.  Other 
applications  include  mid-level  inference  tasks,  e.g.,  estimating  regions  of  observed  scenery 
that  are  “most  interesting”,  i.e.,  salient,  relative  to  a  context  of  “common-place”  imagery 
in  automated  visual  reconnaissance  missions  using  UAVs.  Video  saliency  estimates  can  be 
used  to  select  (control)  subsequent  sensing  states  to  maximize  information  collection. 

• Action recognition on the feature-covariance manifold. [20-22] Here we developed and analyzed
sparse linear and nonlinear manifold representations of video signals for detecting
and recognizing local activity using low-dimensional empirical feature covariance matrices.

Within  Thrust  2,  we  have  obtained  the  following  results  on  the  interaction  between  information 
and  control  in  various  inference  contexts. 

•  Sensor  Scheduling  for  Energy-Efficient  Target  Tracking  in  Sensor  Networks.  [23-26]  We 
studied  the  problem  of  tracking  an  object  moving  randomly  through  a  network  of  wireless 
sensors.  Our  objective  was  to  devise  strategies  for  scheduling  the  sensors  to  optimize  the 
tradeoff  between  tracking  performance  and  energy  consumption. 

• Controlled Sensing for Hypothesis Testing. [27-31] We considered the problem of multiple
hypothesis testing with observation control, and studied the structure of the optimal controller
under various asymptotic regimes.

•  Efficient  Target  Tracking  using  Mobile  Sensors.  [32]  We  studied  a  mathematical  model  for 
tracking  of  a  moving  target  by  multiple  mobile  sensors  in  the  partially  observable  Markov 
decision  process  (POMDP)  framework.  We  proposed  computationally  efficient  policies  for 
controlling  the  mobile  sensors,  and  provided  a  guarantee  on  their  performance  relative  to 
that of the optimal policy.

• Controlled Sensing for Sequential Multihypothesis Testing with Controlled Markovian Observations
and Non-Uniform Control Cost. [33, 34] We proposed a new model for controlled
sensing for multihypothesis testing and studied it in the sequential setting. This new model,
termed the controlled Markovian observation model, exhibits a more complicated memory structure
in the controlled observations than existing models. In addition, instead of penalizing
just the delay until the final decision time, as in standard sequential hypothesis testing problems,
we considered a much more general cost structure that accumulates the total
control cost with respect to an arbitrary control cost function.

• Controlled Sensing Approach to Graph Classification. [35,36] We posed the problem of classifying
graphs with respect to connectivity via partial observations of nodes as a composite
hypothesis testing problem with controlled sensing, and proposed a solution that achieves
asymptotically optimal error performance as the error rate goes to zero.

• Universal Outlier Hypothesis Testing. [37-41] Motivated by our previous research on the
search problem, we studied the outlier hypothesis testing problem in a universal
setting and obtained a number of results on this problem.

•  Universal  Sequential  Outlier  Hypothesis  Testing.  [42-45]  Here  we  extended  our  work  on 
universal  outlier  hypothesis  testing  to  sequential  and  quickest  detection  settings. 

• Universal Tests for Optimal Search and Stop. [46, 47] We studied the problem of universal
search and stop using an adaptive search policy. When the target location is searched,
the observation is assumed to be distributed according to the target distribution; otherwise
it is distributed according to the absence distribution. We assume that only the absence distribution
is known, and the target distribution can be arbitrarily distinct from the absence
distribution. We developed a universal test for this problem and established its asymptotic
optimality.

Within Thrust 3, Decentralized Processing and Interactive Fusion, we have obtained the following
results in the context of dynamic information collection and fusion for situational awareness.

• Interactive Fusion. We extended the sufficiency principle to decentralized inference and
developed a new framework for decentralized data reduction. In particular, it was shown
that with each node subject to a quantization constraint, the traditional sufficiency framework
needs to be augmented by novel notions of sufficiency such as conditional sufficiency. The
guiding principle is to minimize information loss rather than to preserve all of the information,
which is often impossible with quantization.

• Data Reduction with Quantization Constraint. For a multi-sensor tandem system, it was
established that interactive fusion strictly improves the inference performance with dependent
observations. With conditional independence between the sensor observations, however,
there is no asymptotic performance improvement as the sample size increases.

• Network Consensus and Quantized ADMM. Network consensus problems were studied in the
context of a decentralized optimization framework using the alternating direction method of
multipliers (ADMM), again with the realistic constraint of quantization at each node within the
network. A convergence result was established for the first time for deterministic quantizers,
along with consensus error bounds. The approach has significant implications for network
inference, as many decentralized inference problems can be framed as multi-agent optimization
problems, including the consensus problem.

• Resource Management in Sensor Networks. We have studied sensor and resource management
under stringent resource constraints for decentralized inference. These include sensor
selection, scheduling, and bandwidth and power allocation from the perspectives of sparse learning,
information theory, economic equilibria, and compressive sensing.

• Decentralized Inference and Information Fusion. Assured information fusion has been studied
in the context of decentralized inference in the presence of adversaries. Specifically,
decentralized detection and estimation in the presence of Byzantine nodes have been studied,
and fundamental performance limits as well as robust decision rule design have been investigated.

2 Thrust 1: Modeling and Information Processing

2.1  Sensing-aware  inference  with  high-dimensional  signals 

Control of information collection often requires decisions to be made about the state of objects
based on a few indirect, noisy observations of high-dimensional signals, e.g., determining the category
of moving objects in SAR signals to decide whether to continue exploring a certain
geographical region or to change the sensing modality and configuration to get a more accurate
identification of the object category. This belongs to the broad family of inference problems where the
ambient dimension of the sensed data is very large relative to the number of samples but there exists
a latent low-dimensional sensing structure that can potentially be leveraged for inference tasks.
Conventionally, the sensing process is inverted and a decision rule is built in the reconstructed domain,
which requires complete knowledge of the sensing mechanism. Alternatively, a direct data-domain
decision rule might be constructed, but the constraints imposed by the sensing process are
then lost. In this work we explored the behavior of a third path we term “sensing-aware inference.”
This project has contributed to the development of a rigorous theory as well as a practical
algorithmic framework for such challenging problems.

Theoretical results for sensing-aware inference: We considered an abstracted binary supervised
classification problem with very high-dimensional observations, a sensing configuration involving
Gaussian likelihood functions, and limited knowledge of the statistical models of noise and object,
which must be learned from limited training data. We analyzed the impact of different levels of
prior knowledge concerning the latent sensing structure on supervised classification performance
for various classification strategies when the data dimension scales to infinity faster than the number
of samples. In contrast to related studies, here the classification difficulty is held fixed as the
data dimension scales. We established several results:

1. We first proved that strategies that are based on a naive estimation of all model elements
result in a classification performance which is asymptotically no better than pure guessing.
We also proved that sensing-aware, projection-based classification rules attain the Bayes-optimal
risk [5-7].

2.  An  impossibility  result:  We  proved  that  whenever  the  number  of  signal  dimensions  scales 
faster  than  the  number  of  labeled  samples  at  constant  classification  difficulty,  the  asymptotic 
minimax  classification  error  probability  of  any  supervised  classification  algorithm  cannot 
converge  to  anything  less  than  that  of  random  guessing  [2,3]. 

This basic impossibility result points to the fundamental need for sparsity and generalizes
and unifies various special cases. In prior related studies of high-dimensional linear discriminant
analysis (LDA), either the number of samples per dimension is held fixed (or goes to infinity),
or the classification difficulty is made to vanish as the dimension increases to infinity, or only
certain specialized families of learning rules, e.g., maximum-likelihood plug-in rules, were considered.

3. Necessity of “structure” for good performance in high-dimensional inference: We showed
that unless there exists some type of underlying sparsity in the latent low-dimensional signal
parametric structure (specifically, that the parametrization has zero measure with respect to
the Haar measure on a certain high-dimensional unit sphere), it is impossible for any supervised
learning algorithm to attain a non-trivial (i.e., better than random guessing) asymptotic
classification error probability in the regime where the number of signal dimensions scales
faster than the number of samples while maintaining constant classification difficulty [2].

Much of the existing work has exploited sparsity to achieve good performance in high-dimensional
settings. Our theoretical results prove that sparsity is not only “sufficient” but
also necessary.

These  findings  were  validated  through  various  simulations.  Additional  numerical  results  for 
support  vector  machines  and  sensitivity  to  mismatch  between  true  and  assumed  structure  were  also 
generated. 
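
The contrast between naive plug-in estimation and projection-based rules can be illustrated with a small toy simulation. This is only a sketch under simplifying assumptions that are not taken from the cited papers: two Gaussian classes with means +mu and -mu, identity noise covariance, and a "known sensing structure" modeled as the span of the first k coordinates. The function name `simulate` and all parameter values are hypothetical.

```python
import math
import random

def simulate(d=1000, k=2, n=10, n_test=400, signal=2.0, seed=0):
    """Return (naive_error, aware_error) for a toy two-class Gaussian problem."""
    rng = random.Random(seed)
    # Assumption: class means are +mu and -mu, and mu lies in the span of the
    # first k coordinates (the "known" low-dimensional structure). The norm of
    # mu equals `signal`, so classification difficulty does not depend on d.
    mu = [signal / math.sqrt(k)] * k + [0.0] * (d - k)

    def draw(label):
        # Unit-variance Gaussian noise in every ambient coordinate.
        return [label * m + rng.gauss(0.0, 1.0) for m in mu]

    train = [(draw(y), y) for y in (1, -1) for _ in range(n)]
    test = [(draw(y), y) for y in (1, -1) for _ in range(n_test // 2)]

    # Naive plug-in rule: estimate all d coordinates of the mean direction.
    w_naive = [sum(y * x[j] for x, y in train) / len(train) for j in range(d)]
    # Projection-based rule: keep only the coordinates in the known subspace.
    w_aware = w_naive[:k]

    def error(w):
        wrong = 0
        for x, y in test:
            # zip truncates to len(w), so w_aware only sees the first k coordinates.
            score = sum(wj * xj for wj, xj in zip(w, x))
            if (1 if score >= 0 else -1) != y:
                wrong += 1
        return wrong / len(test)

    return error(w_naive), error(w_aware)
```

Calling `simulate()` returns the two empirical test error rates. Increasing `d` while keeping `n` and `signal` fixed should push the naive error toward 0.5, while the projection-based error stays close to the Bayes error of the toy model.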

Practical algorithmic framework for sensing-aware inference: We formulated sensing-aware
inference as inference based on optimally utilizing partial knowledge of a Markov model which
relates observed data to the decision state through a latent unobserved variable [1]. This has contributed
to the development of new sensing-aware inference tools for classical problems. In particular,
we developed a new kernel learning approach to supervised classification: a general
framework for optimum kernel design based on exploiting knowledge of the sensing
structure. We used our algorithmic framework to develop practical algorithms for optimal sensing-aware
classification and applied the methods to document and image classification tasks.

We uncovered the structure of the Bayes-optimal sensing-aware binary classifier. We showed
that the Bayes-optimal classifier with partial knowledge of the Markov structure is a linear (hyperplane)
classifier in a functional space defined by the partial knowledge. We connected sensing-aware
supervised classification to the vast literature devoted to kernel methods for supervised classification.
We showed that the maximum-margin hyperplane classifier in our new functional representation
is equivalent to a kernel-SVM where the kernel is determined by the partial knowledge
of the Markov observation model. This result has two significant consequences:

1. It immediately leads to practical algorithms for sensing-aware supervised classification since
a kernel-SVM can be efficiently solved via a quadratic program.

2. It provides a principled approach to kernel design for kernel SVMs by leveraging knowledge
of the sensing model in an optimal way. Unlike our optimal sensing-aware kernel, the
myriad  kernels  that  have  been  studied  and  used  in  the  SVM-literature  are  not  designed  to 
directly  minimize  the  classification  error.  Moreover,  those  kernels  that  have  been  derived 
from  generative  models  require  full  model  knowledge  which  is  unreasonable  for  large  and 
complex  datasets  like  text  or  images. 

We  showed  that  the  popular  bag-of-words  model  for  text  and  images  can  be  reformulated  as 
a  special  type  of  sensing-aware  model  for  inference.  We  also  derived  the  optimal  sensing-aware 
kernel  for  this  model  in  closed  form  and  developed  several  practical  alternatives  to  the  closed-form 
expression.  In  classification  tasks  on  real-world  document  and  image  datasets,  the  bag-of-words 
sensing-aware kernel SVM noticeably improves over both standard and domain-specific handcrafted
kernels. It even matches the performance of rather sophisticated state-of-the-art approaches
such  as  those  based  on  deep  learning. 

We  also  developed  an  algorithmic  framework  for  designing  sensing-aware  structured  random 
projections  for  dimensionality  reduction  for  fast  nearest-neighbor  classification  [4].  The  role  of 
sensing  structure  in  a  related  problem  of  explosives  detection  using  multi-energy  x-ray  computed 
tomography  was  also  explored  [8,9]. 

2.2 Discovering latent patterns in high-dimensional data

2.2.1  Overview 

We studied the problem of modeling and discovering salient latent topics or patterns in soft and
hard observations with provable performance guarantees. We adopted the non-negative matrix factorization
framework in which “documents” are viewed as probabilistic mixtures of “latent topics”
which are modeled as distributions over “words”. This is the classic “bag of words” paradigm
of probabilistic latent semantic analysis, which ignores information in the word ordering as a
first-order approximation. This framework can also be applied to videos and images with words
corresponding to photometric and spatio-temporal feature vectors. With this representation, the
matrix formed by the column vectors of word distributions from each document is to be factorized
into a topic distribution matrix and a mixing weight matrix.

A number of approaches have been proposed in the literature for non-negative matrix factorization.
Most of them need to resort to some type of approximation to the solution of a non-convex
optimization  problem  (e.g.,  alternating  minimization)  or  resort  to  heuristics.  In  contrast  to  these 
approaches  we  have  developed  a  new  geometrically-motivated  framework  for  non-negative  matrix 
factorization  based  on  two  key  insights: 

1.  Separability  condition:  the  distinguishing  characteristic  of  a  topic  is  the  existence  of  certain 
novel-words  that  are  unique  to  that  topic,  i.e.,  they  either  do  not  occur  or  rarely  occur  in  other 
topics  (relative  to  their  occurrence  in  the  topic). 

2. Identifiability: Distinct sets of separable topics when combined with distinct patterns of
topic-mixing  weights  across  all  documents  should  not  generate  statistically  indistinguishable 
patterns  of  word  distributions  across  all  documents. 
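
In symbols, the factorization and the separability condition described above can be written as follows. The notation (X, β, θ, W, K, D) is introduced here only for illustration and is not taken from the cited papers: X is the W × D matrix whose columns are per-document word distributions, β is the W × K topic matrix, and θ is the K × D matrix of mixing weights. In practice X is only observed through empirical word frequencies, and the condition below is the exact form of separability, whereas the text also allows novel words that occur rarely in other topics.

```latex
X = \beta\,\theta, \qquad
\beta \in \mathbb{R}_{\ge 0}^{W \times K}, \quad
\theta \in \mathbb{R}_{\ge 0}^{K \times D}, \quad
\mathbf{1}^{\top}\beta = \mathbf{1}^{\top}, \quad
\mathbf{1}^{\top}\theta = \mathbf{1}^{\top}

\text{Separability:}\quad \forall k \in \{1,\dots,K\}\ \ \exists\, i_k :\quad
\beta_{i_k k} > 0 \ \text{ and } \ \beta_{i_k l} = 0 \ \text{ for all } l \neq k
```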

Based on these two insights, we developed an algorithm for topic modeling and discovery
that has provable sample-complexity guarantees, performance that is competitive with the current
state of the art, and is free of heuristics and approximations [17, 18].

Our  algorithm  leverages  the  extreme-point  geometry  of  cross-document  empirical  word-word 
co-occurrence  frequencies.  It  makes  use  of  data-dependent  and  random  projections  to  robustly 
identify  and  cluster  novel  words  (extreme  points)  and  associated  topics.  Our  key  insight  here  is 
that  the  maximum  and  minimum  values  of  cross-document  frequency  patterns  projected  along  any 
direction  are  associated  with  novel  words.  Our  sample  complexity  bounds  for  topic  recovery  are 
state-of-the-art.  The  computational  complexity  of  our  random  projection  scheme  scales  linearly 
with the number of documents and the number of words per document. In several experiments on
both synthetic and real-world datasets, our approach appears to significantly outperform competing
methods that have provable guarantees. Furthermore, our approach can deal with degenerate cases
found  in  some  datasets  where  the  extreme  points  can  lie  on  a  manifold  of  a  dimension  that  is  lower 
than  the  number  of  topics. 
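
The extreme-point idea behind the random projection step can be sketched in a few lines. This is a minimal illustration, not the algorithm of [17, 18]: it assumes idealized, noise-free row vectors in which novel words sit at the vertices of a simplex and all other words are strict convex combinations of them, and it omits the clustering of novel words into topics. The function name `extreme_point_candidates` is hypothetical.

```python
import random

def extreme_point_candidates(points, n_projections=200, seed=0):
    """Indices of points that attain the max or min of some random projection."""
    rng = random.Random(seed)
    dim = len(points[0])
    found = set()
    for _ in range(n_projections):
        # Isotropic Gaussian direction: every vertex has a chance to be extremal.
        direction = [rng.gauss(0.0, 1.0) for _ in range(dim)]
        scores = [sum(d * p for d, p in zip(direction, pt)) for pt in points]
        # A linear functional over a convex hull is maximized and minimized
        # at vertices, so only extreme points can be recorded here.
        found.add(max(range(len(points)), key=scores.__getitem__))
        found.add(min(range(len(points)), key=scores.__getitem__))
    return found

# Toy data: rows 0-2 play the role of novel words (one per topic);
# the remaining rows are mixtures and lie strictly inside the simplex.
rows = [
    [1.0, 0.0, 0.0],
    [0.0, 1.0, 0.0],
    [0.0, 0.0, 1.0],
    [0.5, 0.3, 0.2],
    [0.2, 0.2, 0.6],
    [0.4, 0.4, 0.2],
    [0.1, 0.6, 0.3],
]
novel = extreme_point_candidates(rows)
```

With empirical co-occurrence data the rows are noisy, so a practical method has to make the selection robust to sampling error; the sketch only shows why projections single out the extreme points.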

2.2.2  Details  of  key  contributions 

We  established  necessary  and  sufficient  conditions  for  asymptotically  consistent  detection  of  novel 
words  and  estimation  of  topics  in  separable  topic  models  [16].  We  proved  that  the  topic  separability 
condition is an inevitable consequence of high dimensionality (large vocabulary size relative to the
number of topics) [12]. We developed a novel distributed algorithm for novel word detection and
topic  matrix  estimation  whose  statistical  complexity  is  of  the  same  order  as  that  of  the  current 
state-of-the-art  centralized  approaches  while  requiring  insignificant  communication  between  the 
distributed  document  collections  [14].  We  leveraged  our  insights  in  topic  models  to  develop  a  new 
approach  to  the  learning  of  item-preference  behavior  in  large  communities  using  results  of  pairwise 
item  comparisons  [10, 11, 13].  Finally,  we  studied  a  dynamic  (sequential)  version  of  novel  word 
discovery  within  a  hyperspectral  imaging  application  context  which  combines  some  elements  of 
controlled  sensing  (thrust  2)  with  topic  modeling  [15]. 
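
The claim that separability emerges from high dimensionality can be probed numerically. The sketch below draws each topic column from a symmetric Dirichlet distribution and reports, for every topic, the largest share that topic holds of any single word's total probability mass across topics. That share is only an informal proxy for approximate separability chosen for this illustration, and the function name and parameter values are hypothetical rather than taken from [12].

```python
import random

def approx_separability(vocab=2000, topics=5, alpha=0.05, seed=0):
    """For each topic, the largest share it holds of any single word's row mass."""
    rng = random.Random(seed)
    # Column k ~ symmetric Dirichlet(alpha), obtained by normalizing
    # independent Gamma(alpha, 1) draws.
    cols = []
    for _ in range(topics):
        g = [rng.gammavariate(alpha, 1.0) for _ in range(vocab)]
        total = sum(g)
        cols.append([v / total for v in g])
    best = [0.0] * topics
    for i in range(vocab):
        row = [cols[k][i] for k in range(topics)]
        mass = sum(row)
        if mass == 0.0:
            continue  # all entries underflowed to zero; skip this word
        for k in range(topics):
            # A value near 1 means word i is (almost) exclusive to topic k.
            best[k] = max(best[k], row[k] / mass)
    return best
```

For a vocabulary that is large relative to the number of topics, every entry of the returned list is expected to be close to 1, meaning each topic has at least one nearly exclusive word.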

1. Necessary and Sufficient Conditions for Novel Word Detection and Topic Estimation in
Separable Topic Models: We demonstrated, for the first time, that the affine-independence
condition on the topic-mixing weights is a fundamental, algorithm-independent, information-theoretic
necessary condition for asymptotically consistent separable topic estimation. We
also showed that the affine-independence condition is sufficient for asymptotically consistent
topic estimation in separable topic models, and that the stronger simplicial
condition is sufficient for asymptotically consistent novel word detection in separable topic
models.

These conditions and other stronger ones that imply them have played a central role in the
development (over the last 6 years) of polynomial-time algorithms with provable asymptotic
consistency and sample complexity guarantees for topic estimation in separable topic
models. Of these algorithms, those that relied solely on the simplicial condition were
impractical, while the practical ones need stronger conditions.

2.  Inevitability  of  Separability:  Most  Large  Topic  Models  are  Approximately  Separable 

We leveraged separability as a key structural condition in topic models to develop asymptotically
consistent algorithms with polynomial statistical and computational efficiency guarantees.
Empirical estimates of topic matrices for Latent Dirichlet Allocation models are known
to be approximately separable. Separability may be a convenient structural property, but it
appears, on the surface, to be a rather restrictive condition. We proved, however, that separability
is in fact an inevitable consequence of high dimensionality. In particular, we showed
that when the columns of the topic matrix are independently sampled from a Dirichlet distribution,
the resulting topic matrix will be approximately separable with probability tending
to one as the vocabulary size scales to infinity sufficiently faster than the number of topics.
This is based on combining concentration of measure results with properties of the Dirichlet
distribution and union bounding arguments. Our proof techniques can be extended to other
priors for general nonnegative matrices.

3. Efficient Distributed Topic Modeling with Provable Guarantees: Topic modeling for
large-scale distributed web collections requires distributed techniques that account for both
computational and communication costs. In this work we considered topic modeling under
the separability assumption and developed novel computationally efficient methods that
provably achieve the statistical performance of the state-of-the-art centralized approaches
while requiring insignificant communication between the distributed document collections.
We achieve trade-offs between communication and computation without actually transmitting
the documents. Our scheme is based on exploiting the geometry of the normalized word-word
co-occurrence matrix and viewing each row of this matrix as a vector in a high-dimensional
space. We relate the solid angle subtended by extreme points of the convex
hull of these vectors to topic identities and construct distributed schemes to identify topics.
The algorithm is based on random projections and consistently detects all novel words of
all topics using only up to second-order empirical word moments.


4. A Topic Modeling Approach to Learning Preference-Behavior from Pairwise Comparisons:
The recent explosion of web analytics tools has enabled us to collect an immense
amount of partial preferences for large sets of items such as products from Amazon, movies
from Netflix, or restaurants from Yelp, from a large and diverse population of users through
transactions, clicks, etc. Modeling, learning, and ultimately predicting the preference behavior
of users from pairwise comparisons has been extensively studied since the 1927 work
of Thurstone. Yet, almost all models to date have been founded on a clustering perspective
in which users are grouped by their preference behavior. We took a fundamentally different
decomposition perspective and proposed a new class of generative models for pairwise comparisons
in which user preference behavior can be decomposed into contributions from multiple
shared latent “causes” (partial orders) that are prevalent in the population. We showed
how the estimation of shared latent partial orders in the new generative model can be formally
reduced to the estimation of topics in a statistically equivalent topic modeling problem
in which causes correspond to topics and item-pairs to words. We showed that an inevitable
consequence of having a relatively small number of shared latent causes in a world with a large
number of item-pairs is the presence of “novel” item-pairs for each latent cause. We then
leveraged recent advances in the topic modeling literature and developed an algorithm based
on extreme-point identification of convex polytopes to learn the shared latent partial orders.
Our algorithm is provably consistent and comes with polynomial sample and computational
complexity guarantees. We demonstrated that our new model is empirically competitive with
the current state-of-the-art approaches in predicting preferences on semi-synthetic and real-world
datasets.

5. Dynamic Topic Discovery through Sequential Projections: In order to connect the topic modeling
and discovery algorithms of this thrust (system modeling and local information processing)
with the controlled sensing thrust, we focused on the aerial hyperspectral imaging
application in which words correspond to pixels, topics to different species, and documents
to wavelengths or frequencies. The universe of all possible species was modeled as a dictionary
and each measurement as the selection of one frequency band. The controlled sensing
question of how to select the next frequency band so as to optimize the information gain
based on previous observations and knowledge of the dictionary was explored.

Specifically, we proposed an adaptive strategy for controlling the sensing order so as to
maximize a suitably normalized solid angle as a robustness measure of the problem geometry.
This is based on efficiently identifying pure pixels that are unique to each endmember
and exploiting information from a spectral library known in advance through sequential random
projections. Simulations on synthetic datasets demonstrated the merits of our scheme
in reducing the observation cost.

2.3  Action  recognition  on  the  feature-covariance  manifold 

Algorithms  for  recognizing  human  actions  in  a  video  sequence  are  needed  for  automated  aerial 
surveillance  using  UAVs.  Developing  algorithms  that  are  not  only  accurate  but  also  efficient  is 
challenging  due  to  the  complexity  of  the  task  and  the  sheer  size  of  video. 

We developed a general framework for compactly representing, quickly comparing, and accurately
recognizing actions using empirical covariance matrices of activity features extracted from
video sequences [20-22]. With each pixel, we associate a feature vector which provides a localized
description of the action. This generates a spatio-temporally dense collection of action feature vectors.
The empirical covariance matrix of this feature vector collection provides a low-dimensional
representation of the action. For action recognition, we adapted two supervised learning methods,
namely the classical nearest-neighbor classifier and the recently developed sparse linear approximation
classifier, to work with labeled training dictionaries of action covariance matrices. Key to
this adaptation is the novel idea that classification algorithms that have been developed for vectors
can be re-purposed for covariance tensors by using the log-nonlinearity to map the convex cone of
covariance matrices to the (tangent) vector space of symmetric matrices.
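
The log-mapping step can be made concrete with a small sketch. To stay self-contained it handles only 2×2 symmetric positive-definite matrices, for which the eigendecomposition has a closed form; real feature covariance matrices are larger and would need a general symmetric eigensolver. The function names, the example matrices, and the labels "walk" and "run" are hypothetical, and the nearest-neighbor rule shown is a plain Euclidean one applied after the mapping, not necessarily the exact variant used in [20-22].

```python
import math

def log_spd_2x2(m):
    """Matrix logarithm of a 2x2 symmetric positive-definite matrix."""
    (a, b), (_, c) = m
    mean = (a + c) / 2.0
    radius = math.hypot((a - c) / 2.0, b)
    lam1, lam2 = mean + radius, mean - radius  # eigenvalues, lam1 >= lam2
    if lam2 <= 0.0:
        raise ValueError("matrix must be positive definite")
    if radius == 0.0:
        # Multiple of the identity: the logarithm acts on the diagonal directly.
        l = math.log(a)
        return [[l, 0.0], [0.0, l]]
    # Unit eigenvector v for lam1; log(M) = log(lam1) v v^T + log(lam2) (I - v v^T).
    if b != 0.0:
        vx, vy = b, lam1 - a
    else:
        vx, vy = (1.0, 0.0) if a > c else (0.0, 1.0)
    norm = math.hypot(vx, vy)
    vx, vy = vx / norm, vy / norm
    l1, l2 = math.log(lam1), math.log(lam2)
    off = (l1 - l2) * vx * vy
    return [[l2 + (l1 - l2) * vx * vx, off], [off, l2 + (l1 - l2) * vy * vy]]

def log_vector(m):
    """Flatten log(M); the sqrt(2) weight makes Euclidean distance equal Frobenius distance."""
    l = log_spd_2x2(m)
    return [l[0][0], math.sqrt(2.0) * l[0][1], l[1][1]]

def nearest_label(query, dictionary):
    """Nearest-neighbor label over a list of (covariance_matrix, label) pairs."""
    q = log_vector(query)
    return min(dictionary, key=lambda item: math.dist(q, log_vector(item[0])))[1]

# Hypothetical two-entry training dictionary and a query covariance matrix.
training = [([[1.0, 0.0], [0.0, 1.0]], "walk"), ([[4.0, 1.0], [1.0, 3.0]], "run")]
label = nearest_label([[3.5, 0.8], [0.8, 2.9]], training)
```

Once the matrices are mapped to vectors in this way, any vector-space classifier can be applied unchanged, which is the point of the adaptation described above.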

We tested the approach on two types of action feature vectors: one based on silhouette tunnels
of moving objects and the other on optical flow. Action feature vectors of the first type describe
the  shape  of  the  silhouette  tunnel.  Action  feature  vectors  of  the  second  type  describe  various 
motion  characteristics  such  as  velocity,  gradient,  and  divergence.  We  demonstrated  state-of-the-art 
recognition  performance  for  both  types  of  action  feature  vectors  on  the  Weizmann,  KTH,  YouTube, 
and  the  low-resolution  ICPR-2010  challenge  data  sets  under  modest  CPU  requirements. 

We  also  demonstrated  how  our  approach  can  be  used  for  sequentially  detecting  changes  in 
actions  in  an  adaptive,  unsupervised  manner  so  as  to  parse  a  long  video  into  sub-videos,  each 
containing  only  a  single  action  class.  We  used  a  non-parametric  statistical  framework  to  learn 
the  distribution  of  the  nearest-neighbor  Riemannian  distances  between  feature  covariance  matrices 
of  video  segments.  Then,  we  used  a  binary  hypothesis  test  to  determine  if  new  video  segments 
include  action  changes.  In  synthetic  and  natural  videos,  our  algorithm  detects  roughly  98%  of 
action  boundaries  with  roughly  0.2%  false  alarm  rate. 
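
One simple way to realize such a non-parametric test is to threshold the nearest-neighbor distance at an empirical quantile of distances collected when no change occurred. The sketch below makes that assumption; the function names and the quantile rule are hypothetical and are not taken from the report.

```python
def quantile_threshold(null_distances, false_alarm_rate):
    """Order statistic that at least a (1 - false_alarm_rate) fraction of null distances do not exceed."""
    ordered = sorted(null_distances)
    index = min(len(ordered) - 1, int((1.0 - false_alarm_rate) * len(ordered)))
    return ordered[index]

def action_changed(new_distance, threshold):
    """Binary test: declare a change when the new segment is unusually far from its nearest neighbor."""
    return new_distance > threshold
```

Lowering `false_alarm_rate` raises the threshold, trading missed boundaries for fewer false alarms.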

We  also  investigated  how  our  framework  can  be  adapted  to  recognize  human  interactions, 
which  is  typically  a  more  challenging  problem  due  to  occlusion  between  moving  individuals.  We 
developed  an  approach  based  on  dividing  human  interactions  into  separate  sequences,  each  con¬ 
taining  a  single  individual,  and  then  combining  the  estimated  action  likelihoods  for  each  individual 
sequence. 

The  excellent  performance  of  the  log-covariance-matrix  representation  combined  with  sparse- 
linear  approximation  classification  demonstrated  in  our  work  for  action  recognition  should  encour¬ 
age  the  use  of  this  framework  for  other  local  activity  detection,  localization,  and  categorization 
problems. 


3  Thrust  2:  Interaction  Between  Information  and  Control 

3.1  Sensor  Scheduling  for  Energy-Efficient  Target  Tracking  in  Sensor  Networks

In  this  part  of  the  project,  we  studied  the  problem  of  tracking  an  object  moving  randomly  through  a 
network  of  wireless  sensors.  Our  objective  was  to  devise  strategies  for  scheduling  the  sensors  to  op¬ 
timize  the  tradeoff  between  tracking  performance  and  energy  consumption.  We  cast  the  scheduling 
problem  as  a  partially  observable  Markov  decision  process  (POMDP),  where  the  control  actions 
correspond  to  the  set  of  sensors  to  activate  at  each  time  step.  Using  a  bottom-up  approach,  we 
considered  different  sensing,  motion  and  cost  models  with  increasing  levels  of  difficulty.  At  the 
first  level,  the  sensing  regions  of  the  different  sensors  do  not  overlap  and  the  target  is  only  observed 
within  the  sensing  range  of  an  active  sensor.  Then,  we  considered  sensors  with  overlapping  sensing 
range  such  that  the  tracking  error,  and  hence  the  actions  of  the  different  sensors,  are  tightly  coupled.
Finally,  we  considered  scenarios  wherein  the  target  locations  and  sensors’  observations  assume
values  on  continuous  spaces.  Exact  solutions  are  generally  intractable  even  for  the  simplest  models
due  to  the  dimensionality  of  the  information  and  action  spaces.  Hence,  we  devised  approximate
solution  techniques,  and  in  some  cases  derived  lower  bounds  on  the  optimal  tradeoff  curves.  The
generated  scheduling  policies,  albeit  suboptimal,  often  provide  close-to-optimal  energy-tracking 
tradeoffs. 
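
For the simplest model above (non-overlapping sensing regions, target seen only by an active sensor covering its cell), the POMDP information state is a belief vector over cells. The following sketch shows the belief recursion and a myopic activation rule. It assumes one sensor per cell and perfect detection; the rule `choose_sensors` and the parameter `energy_price` are hypothetical illustrations, not the policies of [23-26].

```python
def predict(belief, transition):
    """One-step prediction: b'[j] = sum_i b[i] * P[i][j]."""
    n = len(belief)
    return [sum(belief[i] * transition[i][j] for i in range(n)) for j in range(n)]

def update(predicted, active, detected_cell=None):
    """Condition the predicted belief on the outcome at the active sensors."""
    if detected_cell is not None:
        # An active sensor saw the target, so its cell is known exactly.
        return [1.0 if j == detected_cell else 0.0 for j in range(len(predicted))]
    # No active sensor saw the target: rule out the active cells and renormalize.
    masked = [0.0 if j in active else p for j, p in enumerate(predicted)]
    total = sum(masked)
    if total == 0.0:
        raise ValueError("belief has no mass outside the active cells")
    return [p / total for p in masked]

def choose_sensors(predicted, energy_price):
    """Myopic rule: wake sensor j only if the predicted probability of cell j exceeds energy_price."""
    return {j for j, p in enumerate(predicted) if p > energy_price}
```

Raising `energy_price` activates fewer sensors and lets the belief spread, which is the energy-tracking tradeoff in miniature.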

The  publications  that  resulted  from  this  work  are  [23-26]. 

3.2  Controlled  Sensing  for  Hypothesis  Testing 

We  considered  the  problem  of  multiple  hypothesis  testing  with  observation  control,  and  studied  the 
structure  of  the  optimal  controller  under  various  asymptotic  regimes.  First,  we  considered  a  setup 
with  a  fixed  sample  size,  in  which  the  asymptotic  quantity  of  interest  is  the  optimal  error  exponent 
under  one  hypothesis  subject  to  constraints  on  the  probabilities  of  error  under  the  alternative
hypotheses.  For  the  binary  hypothesis  case,  we  were  able  to  show  that  the  optimal  error  exponent
corresponds  to  the  maximum  Kullback-Leibler  (KL)  divergence  where  the  maximization  is  over  the
choice  of  controls.  We  have  further  shown  that  a  pure  stationary  control,  i.e.,  one  which  is  fixed 
and  does  not  depend  on  specific  realizations  of  past  measurements  and  past  controls  (open-loop), 
is  asymptotically  optimal  even  among  the  class  of  causal  control  policies.  We  also  derived  lower 
and  upper  bounds  on  the  optimal  error  exponent  for  the  multiple  hypothesis  case. 
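
For the binary case, selecting the best open-loop control amounts to comparing KL divergences across controls. The sketch below assumes finite observation alphabets and takes hypothesis 0 to be the one whose error probability is constrained, so the exponent is D(p0^u || p1^u); the dictionary layout and function names are illustrative assumptions.

```python
import math

def kl_divergence(p, q):
    """D(p || q) in nats for two distributions on the same finite alphabet."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)

def best_open_loop_control(models_h0, models_h1):
    """Return (control, divergence) maximizing D(p0^u || p1^u) over the controls u."""
    return max(
        ((u, kl_divergence(models_h0[u], models_h1[u])) for u in models_h0),
        key=lambda pair: pair[1],
    )
```

A control under which the two hypotheses induce the same observation distribution contributes zero divergence and is never selected when a more informative control exists.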

We  next  considered  a  sequential  setup  wherein  the  controller  can  also  decide  when  to  stop 
taking  observations.  In  this  case,  the  objective  is  to  minimize  the  expected  stopping  time  subject 
to  the  constraints  of  vanishing  error  probabilities  under  each  hypothesis.  We  proposed  a  sequential 
test  for  the  multiple  hypothesis  case  and  showed  that  it  is  asymptotically  first-order  optimal. 

The  publications  that  resulted  from  this  work  are  [27-31]. 

3.3  Efficient  Target  Tracking  using  Mobile  Sensors 

We  studied  a  mathematical  model  for  tracking  of  a  moving  target  by  multiple  mobile  sensors  in  the 
partially  observable  Markov  decision  process  (POMDP)  framework.  We  proposed  computationally 
efficient  policies  for  controlling  the  mobile  sensors,  and  provided  a  guarantee  on  their  performance 
relative  to  that  of  the  optimal  policy.  Simulation  results  showed  that  our  proposed  policies  did 
perform  close  to  the  optimal  policy  for  certain  small  spatially  stationary  models  in  which  a  mobile 
sensor  can  always  move  as  fast  as  the  target  [32]. 

3.4  Controlled  Sensing  for  Sequential  Multihypothesis  Testing  with
Controlled  Markovian  Observations  and  Non-Uniform  Control  Cost

We  proposed  a  new  model  for  controlled  sensing  for  multihypothesis  testing  and  studied  it  in  the 
sequential  setting.  This  new  model,  termed  controlled  Markovian  observation  model,  exhibits 
a  more  complicated  memory  structure  in  the  controlled  observations  than  existing  models.  In 
addition,  instead  of  penalizing  just  the  delay  until  the  final  decision  time  as  in  standard  sequential
hypothesis  testing  problems,  we  considered  a  much  more  general  cost  structure  that  accumulates
the  total  control  cost  with  respect  to  an  arbitrary  control  cost  function.  We  proposed
an  asymptotically  optimal  test  for  this  new  model  and  showed  that  it  satisfies  a  strong  asymptotic
optimality  condition  formulated  in  terms  of  decision-making  risk.  We  also  showed  that  the  optimal
causal  control  policy  for  the  controlled  sensing  problem  is  self-tuning,  in  the  sense  of  maximizing
an  inherent  “inferential”  reward  simultaneously  under  every  hypothesis,  with  the  maximal  value
being  the  best  possible,  corresponding  to  the  case  where  the  true  hypothesis  is  known  at  the  outset.  We
also  proposed  another  test  to  meet  distinctly  predefined  constraints  on  the  various  decision  risks 
non-asymptotically,  while  retaining  asymptotic  optimality. 

We  proved  our  results  using  a  combination  of  tools  and  principles  from  both  decision  theory 
and  stochastic  control.  Interestingly,  although  the  role  of  the  causal  control  policy  in  the  con¬ 
trolled  sensing  problem  is  merely  to  facilitate  the  eventual  testing  among  the  hypotheses  without 
any  explicit  reward  structure  to  gauge  how  well  the  different  control  policies  perform,  our  results 
show  that  there  is  an  inherent  inferential  reward  structure  maximized  by  the  control  policy  of  the 
asymptotically  optimal  test  for  the  controlled  sensing  problem. 

These  results  were  published  in  [33,34]. 

3.5  Controlled  Sensing  Approach  to  Graph  Classification 

We  posed  the  problem  of  classifying  graphs  with  respect  to  connectivity  via  partial  observations 
of  nodes  as  a  composite  hypothesis  testing  problem  with  controlled  sensing.  An  observation  at 
a  node  is  a  subset  of  edges  incident  to  the  node  on  the  complete  graph  drawn  according  to  a 
probability  model,  which  are  modeled  as  conditionally  independent  given  their  neighborhoods. 
Connectivity  is  measured  through  average  node  degree  and  is  classified  with  respect  to  a  threshold. 
We  derived  a  simple  approximation  of  the  controlled  sensing  test  and  simulated  it  on  Erdős-Rényi
Model  A  graphs  to  characterize  error  probabilities  as  a  function  of  expected  stopping  times.  We 
showed  that  our  test  achieves  favorable  tradeoffs  between  the  classification  error  and  the  number 
of  measurements  and  further  outperforms  existing  approaches,  especially  at  low  target  error  rates. 
Furthermore,  the  proposed  test  achieves  asymptotically  optimal  error  performance,  as  the  error  rate 
goes  to  zero.  See  [35,36]  for  details. 

3.6  Universal  Outlier  Hypothesis  Testing 

Motivated  by  our  previous  research  on  the  search  problem,  we  studied  the  following  outlier
hypothesis  testing  problem  in  a  universal  setting.  Vector  observations  are  collected  each  with  M  ≥  3
coordinates,  a  small  subset  of  which  are  outlier  coordinates.  When  a  coordinate  is  an  outlier,  the
observations  in  that  coordinate  are  assumed  to  be  distributed  according  to  an  “outlier”  distribution,
distinct  from  the  “typical”  distribution  governing  the  observations  in  all  the  other  coordinates.
Nothing  is  known  about  the  outlier  and  typical  distributions  except  that  they  are  distinct  and  have 
full  supports.  The  goal  is  to  design  a  universal  test  to  best  discern  the  outlier  coordinate(s).  For 
models  with  exactly  one  outlier,  we  proposed  a  universal  test  based  on  the  principle  of  the  gen¬ 
eralized  likelihood  test  and  showed  that  it  is  universally  exponentially  consistent.  We  derived  a 
single-letter  characterization  of  the  error  exponent  achievable  by  the  test,  and  showed  that  the  test 
is  asymptotically  efficient  as  the  number  of  coordinates  approaches  infinity.  When  the  null  hy¬ 
pothesis  with  no  outlier  is  included,  we  showed  that  a  modification  of  this  test  achieves  the  same 
error  exponent  under  each  non-null  hypothesis,  and  also  consistency  under  the  null  hypothesis
universally.  Then,  we  studied  models  with  more  than  one  outlier  in  the  following  settings.  For  the
setting  with  a  known  number  of  distinctly  distributed  outliers,  we  proposed  a  universally
exponentially  consistent  test,  and  characterized  its  achievable  error  exponent.  We  also  characterized  the
limiting  error  exponent  achieved  by  the  test,  and  established  that  it  enjoys  universally  asymptoti¬ 
cally  exponential  consistency.  For  the  setting  with  an  unknown  number  of  identically  distributed 
outliers,  we  showed  that  a  different  test  achieves  a  positive  error  exponent  under  each  non-null 
hypothesis,  and  also  consistency  under  the  null  hypothesis  universally.  When  the  outliers  can  be 
distinctly  distributed  (with  their  total  number  being  unknown),  we  showed  that  a  universally  ex¬ 
ponentially  consistent  test  cannot  exist,  even  when  the  typical  distribution  is  known  and  the  null 
hypothesis  is  excluded. 
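
For the single-outlier model with neither distribution known, one common form of a generalized-likelihood statistic scores each candidate coordinate by how well the remaining coordinates agree with their pooled empirical distribution. The sketch below assumes a finite alphabet and equal sample sizes in every coordinate; it is an illustration of the principle and the exact statistic analyzed in [37-41] is not reproduced in this report.

```python
import math
from collections import Counter

def empirical(samples, alphabet):
    """Empirical distribution of the samples over the given alphabet."""
    counts = Counter(samples)
    return [counts[a] / len(samples) for a in alphabet]

def kl(p, q):
    """D(p || q) in nats; terms with p[k] == 0 contribute zero."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)

def detect_outlier(observations, alphabet):
    """observations[i] is the list of samples seen in coordinate i; returns the declared outlier index."""
    emp = [empirical(obs, alphabet) for obs in observations]
    m = len(emp)
    scores = []
    for i in range(m):
        others = [emp[j] for j in range(m) if j != i]
        # Pooled estimate of the typical distribution if coordinate i were the outlier.
        pooled = [sum(col) / (m - 1) for col in zip(*others)]
        scores.append(sum(kl(g, pooled) for g in others))
    # The hypothesis under which the non-outlier coordinates look most alike wins.
    return min(range(m), key=scores.__getitem__)
```

With only two coordinates the score is zero for both candidates, which is consistent with the requirement of at least three coordinates.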

These  results  have  appeared  in  [37-41].

3.7  Universal  Sequential  Outlier  Hypothesis  Testing

We  proposed  a  universal  test  based  on  the  principles  underlying  the  Multihypothesis  Sequential 
Probability  Ratio  Test  (MSPRT)  and  the  generalized  likelihood  (GL)  test.  When  only  the  typical 
distribution  is  known,  we  derived  a  lower  bound  for  the  error  exponent  achievable  by  our  proposed 
test.  This  lower  bound  shows  that  this  error  exponent  is  larger  than  the  optimal  error  exponent  in 
the  fixed  sample  size  setting  when  the  outlier  distribution  is  also  known.  We  then  considered  the 
completely  universal  setting  where  neither  the  typical  nor  the  outlier  distribution  is  known,  and 
established  the  universally  exponential  consistency  of  our  test  whenever  there  are  three  or  more 
hypotheses.  In  addition,  we  derived  a  lower  bound  for  the  achievable  error  exponent  applicable 
when  the  number  of  hypotheses  is  sufficiently  large.  We  also  showed  that  the  asymptote  of  this 
lower  bound  (in  the  number  of  hypotheses)  coincides  with  the  previous  lower  bound  when  the 
typical  distribution  is  known.  With  an  additional  null  hypothesis  with  no  outlier,  we  showed  that  a 
suitable  modification  to  our  proposed  test  is  universally  consistent  under  the  null  hypothesis  while 
achieving  universal  exponential  consistency  under  every  non-null  hypothesis  for  both  the  settings. 
We  have  also  extended  these  results  to  the  quickest  detection  setting.  See  [42-45]  for  details. 

3.8  Universal  Tests  for  Optimal  Search  and  Stop 

We  studied  the  problem  of  universal  search  and  stop  using  an  adaptive  search  policy.  When  the 
target  location  is  searched,  the  observation  is  assumed  to  be  distributed  according  to  the  target 
distribution,  otherwise  it  is  distributed  according  to  the  absence  distribution.  We  assume  that  only 
the  absence  distribution  is  known,  and  the  target  distribution  can  be  arbitrarily  distinct  from  the 
absence  distribution.  An  adaptive  search  policy  specifies  the  current  search  location  based  on  the 
past  observations  and  past  search  locations.  At  the  stopping  time,  the  target’s  location  is  determined 
or  it  is  decided  that  it  is  missing.  The  overall  goal  is  to  achieve  a  certain  level  of  accuracy  for  the 
final  decision  using  the  fewest  number  of  observations.  The  results  in  this  work  should  be  regarded 
as  a  contribution  to  the  long-studied  area  of  search  theory,  in  particular,  searching  for  a  stationary
target  in  discrete  time  and  space  with  a  discrete  search  effort. 

Conceptually,  a  desirable  goal  of  the  search  at  each  location  should  be  to  determine  if  the 
target  is  there.  To  this  end,  a  universal  sequential  test  for  two  hypotheses  can  be  used  at  each 
location  to  collect  multiple  subsequent  observations  that  will  eventually  lead  to  a  binary  outcome 
that  the  target  is  there  or  not.  To  improve  reliability  for  this  binary  decision  at  a  particular  search 
location,  one  can  use  a  test  that  takes  more  observations  at  that  location.  If  we  insist  on  using 
the  mentioned  sequential  binary  test  at  each  location  as  an  “inner”  test,  then  it  is  convenient  to 
select  the  current  search  location  based  on  the  past  binary  outcomes  of  the  subsequent  binary  tests
(instead  of  all  the  past  observational  outcomes  of  all  the  searches,  generally  taken  multiple  times 
at  each  of  the  locations).  With  this  imposition,  the  search  and  stop  problem  can  be  conceptually 
reduced  to  the  problem  of  constructing  an  “outer”  test  for  the  sequential  design  of  such  inner 
experiments.  This  intuitive  decomposition  leads  to  our  proposed  universal  sequential  test  for  search 
and  stop.  We  showed  that  when  the  target  is  present,  the  proposed  universal  test  yields  a  vanishing 
error  probability,  and  achieves  the  optimal  reliability,  in  terms  of  a  suitable  exponent  for  the  error 
probability,  universally  for  every  target  distribution.  Consequently,  the  knowledge  of  the  target 
distribution  is  only  useful  for  improving  reliability  for  detecting  a  missing  target.  We  also  showed 
that  a  multiplicative  gain  for  the  search  reliability  equal  to  the  number  of  searched  locations  is 
achieved  by  allowing  adaptivity  in  the  search.  See  [46, 47]  for  details. 


4  Thrust  3:  Decentralized  Processing  and  Interactive  Fusion 

4.1  Interactive  Fusion 

Existing  literature  in  information  fusion  almost  exclusively  assumes  a  static  setting  in  information 
flow:  nodes  propagate  information  on  a  directed  graph  (often  in  the  form  of  a  parallel,  tandem,  or 
tree  network)  and  no  interaction  is  assumed  or  allowed  between  nodes.  We  have  instead  taken  a 
more  holistic  approach  on  information  fusion  where  node  interaction  is  allowed  in  that  communi¬ 
cations  may  occur  in  an  interactive  manner.  Note  this  differs  from  the  traditional  study  of  feedback 
in  tree  structure  information  fusion  as  we  do  not  limit  the  number  of  rounds  of  interaction  and  do 
not  restrict  it  to  only  between  fusion  center  and  peripheral  nodes. 

We  established  in  [48]  that,  with  conditionally  independent  observations,  while  interactive  fusion
may  strictly  improve  detection  performance  in  the  finite  sample  regime,  it  offers  no  improvement
over  the  static  tandem  fusion  system  in  the  large  sample  regime.  The  optimum  error  exponent,
namely  the  Kullback-Leibler  distance,  remains  the  same  for  both  systems.  However,  with
conditionally  dependent  observations,  strict  performance  improvement  in  both  finite-sample  and
asymptotic  regimes  is  possible.

The  study  of  interactive  fusion  is  based  on  a  simple  but  elegant  result  regarding  the  optimal 
decision  structure  for  general  inference  problems  with  convex  or  affine  objective  functions.  This 
simple  result  has  broader  applications  to  inference  problems  that  are  beyond  the  specific  problem 
of  interactive  fusion.  For  example,  one  can  establish  that  for  the  general  tandem  fusion  system, 
communication  direction  should  always  be  in  favor  of  the  sensor  with  high  SNR,  i.e.,  it  should 
serve  as  the  fusion  center  [49]. 

This  interactive  fusion  framework  can  be  applied  to  various  different  fusion  systems.  In  partic¬ 
ular,  we  have  studied  the  simple  scheme  of  sensor  overhearing  in  a  simple  parallel  fusion  system 
where  similar  results  have  been  established  that  contrast  the  system  performance  with  overhearing 
to  that  of  independent  processing  at  all  peripheral  nodes  [50]. 

4.2  Data  Reduction  with  Quantization  Constraint 

The  sufficiency  principle  acts  as  a  guiding  principle  for  data  reduction  in  statistical  inference.  A 
sufficient  statistic  is  a  function  of  the  data,  chosen  so  that  it  ‘should  summarize  the  whole  of  the
relevant  information  supplied  by  the  sample’.  In  decentralized  settings,  a  sufficient  statistic  defined
with  respect  to  local  data  is  referred  to  as  a  local  sufficient  statistic;  if  a  collection  of  local  statistics
forms  a  global  sufficient  statistic,  they  are  said  to  be  globally  sufficient.  While  sufficiency-based  data
reduction  ensures  no  loss  of  inference  performance  using  the  reduced  data,  communicating  even
one-dimensional  real-valued  data  may  still  be  infeasible  when  communication  is  subject  to  a  finite  capacity
constraint.  The  question  then  arises:  if  each  node  in  a  decentralized  inference  system  has  to
summarize  its  data  using  a  finite  number  of  bits,  is  it  still  optimal  to  implement  data  reduction
using  global  sufficient  statistics  prior  to  quantization?  The  answer  is  unfortunately  no,  and  a
simple  example  is  given  in  [51]  showing  that  global  sufficiency  does  not  guarantee  optimal  data
reduction  in  the  presence  of  finite-bit  quantization,  which  leads  inevitably  to  information  loss.

On  the  other  hand,  it  was  also  established  in  [51]  that  with  conditionally  independent
observations,  the  traditionally  defined  global  sufficient  statistic  is  still  optimal  in  maximizing  the
information  at  the  terminal  node  (i.e.,  the  fusion  center).  Within  the  class  of  conditionally  dependent
observations,  there  also  exist  cases  where  quantizing  local  sufficient  statistics  is  structurally  optimal.
Using  a  simple  two-node  system  as  an  illustration,  when  $X_1$  and  $X_2$  are  conditionally  dependent
and  $\theta$  is  the  underlying  parameter  of  inference  interest,  a  hidden  variable  $W$  can  be  introduced  to
induce  the  following  Markov  chains:

$$X_1 - W - X_2,$$

$$\theta - W - (X_1, X_2).$$

Within  this  hierarchical  conditional  independence  model,  first  introduced  in  [52],  if  $T_1(X_1)$  and
$T_2(X_2)$  are  local  statistics  that  are  sufficient  with  respect  to  $W$,  quantizing  $T_1(X_1)$  and  $T_2(X_2)$
at  the  respective  sensor  is  structurally  optimal  for  the  decentralized  inference  problem.  This  new 
framework  of  decentralized  data  reduction  with  quantization  constraints  has  broad  applications  to 
numerous  inference  problems  involving  networks  of  sensors  and  warrants  further  studies  under 
more  general  network  settings. 
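
To see why statistics that are sufficient for the hidden variable matter, the two Markov chains can be unpacked into a factorization of the likelihood. The derivation below is a sketch that assumes a discrete hidden variable $W$ (replace the sum by an integral otherwise) and writes the two observations as $X_1, X_2$, the parameter as $\theta$, and the Fisher-Neyman factors as $g_i, h_i$; these symbols are introduced here for illustration.

```latex
\begin{align*}
p(x_1, x_2 \mid \theta)
  &= \sum_{w} p(w \mid \theta)\, p(x_1 \mid w)\, p(x_2 \mid w) \\
  &= h_1(x_1)\, h_2(x_2) \sum_{w} p(w \mid \theta)\,
     g_1\bigl(T_1(x_1), w\bigr)\, g_2\bigl(T_2(x_2), w\bigr).
\end{align*}
```

The first line uses both Markov chains; the second uses the factorization $p(x_i \mid w) = g_i(T_i(x_i), w)\, h_i(x_i)$ implied by sufficiency of $T_i$ for $W$. The dependence on $\theta$ therefore enters only through $(T_1(x_1), T_2(x_2))$. This shows global sufficiency of the pair; the structural optimality of quantizing it is the separate result stated above.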

4.3  Network  Consensus  and  Quantized  ADMM 

Very  few  algorithms  exist  for  distributed  optimization  under  a  quantized  communication
constraint.  Existing  quantized  algorithms  are  based  on  the  subgradient  method  and  are  only
guaranteed  to  reach  a  neighborhood  of  the  optimal  value  at  a  sublinear  rate,  with  the  error  increasing
in  the  size  of  the  network.  Recently,  an  ADMM-based  quantized  algorithm,  referred  to  as  the
quantized  consensus  ADMM  (QC-ADMM),  has  been  proposed  in  [53].  It  primarily  solves  the
distributed  optimization  problem  of  the  following  form

$$\arg\min_{x} \sum_{i=1}^{N} f_i(x),$$

where  $f_i : \mathbb{R}^m \to \mathbb{R}$  is  the  local  objective  function,  using  only  local  computation  and  quantized
communication.

The  advantage  of  the  proposed  algorithm  is  that,  when  certain  convexity  assumptions  are
satisfied,  all  agents  converge  to  the  same  quantization  point  within  $\log_{1+\eta} \Omega$  iterations,  where  $\eta > 0$
depends  on  the  local  objectives  and  the  network  topology,  and  $\Omega$  is  a  polynomial  fraction  decided
by  the  quantization  resolution,  the  distance  between  initial  and  optimal  variable  values,  the  local
objective  functions  and  the  network  topology.  Furthermore,  the  consensus  error  does  not  depend
on  the  size  of  the  network  and  is  usually  smaller  than  the  error  of  existing  quantized  algorithms.

While  the  above  algorithm  is  readily  applied  to  distributed  averaging  as  it  is  equivalent  to  a
least-squares  minimization  problem,  we  notice  that  the  QC-ADMM  does  not  converge  uniquely. 
For  locally  convergent  algorithms,  it  is  well-known  that  a  good  starting  point  usually  helps.  Based 
on  this  fact,  [54]  proposed  a  two-stage  method  which  first  uses  the  ADMM  with  dithered  quan¬ 
tization  to  obtain  a  good  starting  point  and  then  employs  the  QC-ADMM  to  reach  a  consensus. 
Simulations  show  that  the  consensus  error  of  this  two-stage  approach  is  typically  less  than  one 
quantization  resolution  for  all  connected  networks  where  agents’  data  can  be  of  arbitrary  magni¬ 
tudes. 
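
Dithered quantization in the first stage can be pictured as randomized rounding to the quantization grid, which makes the quantizer output unbiased. The sketch below shows one common form of this idea, uniform dither followed by rounding, written as rounding up with probability equal to the fractional part. The function name and this specific form are assumptions and are not taken from [54].

```python
import math
import random

def dithered_quantize(x, resolution, rng=random):
    """Round x to a neighboring grid point at random so that the expected output equals x."""
    scaled = x / resolution
    lower = math.floor(scaled)
    # Round up with probability equal to the fractional part of x / resolution.
    round_up = rng.random() < (scaled - lower)
    return resolution * (lower + (1 if round_up else 0))
```

Averaging many such outputs recovers `x`, which is why a dithered stage can supply a good starting point before a deterministic quantizer takes over.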


4.4  Resource  Management  in  Sensor  Networks 

With  resource  constrained  sensor  networks,  sensor  management  and  resource  allocation  play  a 
crucial  role  in  maximizing  the  information  gathering  capability  with  limited  sensing  assets.  We 
have  studied  the  following  problems  along  the  line  of  sensor  management  for  situational  awareness. 


Sparsity-promoting  sensor  scheduling 

We  formulated  the  sensor  scheduling  problem  as  a  sparsity-aware  optimization  problem,  where 
the  goal  of  reducing  the  number  of  selected  sensors  is  captured  by  a  sparsity-promoting  penalty
term  in  the  objective  function  [55].  The  proposed  sensor  scheduling  approach  has  been  successfully
applied  in  field  estimation  and  target  tracking  [56,57].  Furthermore  in  [58]  and  [59],  to  account  for 
the  individual  power  constraint  of  each  sensor,  we  generalized  the  sparsity-promoting  optimiza¬ 
tion  framework  in  [55]  by  introducing  a  new  sparsity-promoting  penalty  function  which  avoids 
successive  selections  of  the  same  group  of  sensors. 


Optimal  sparse  sensor  collaboration 

The  problem  of  sensor  collaboration  arises  by  incorporating  the  process  of  inter-sensor  communi¬ 
cation  in  a  classical  distributed  estimation  network.  We  associated  the  cost  of  sensor  collaboration 
with  the  elementwise  sparsity  of  the  collaboration  matrix,  and  the  cost  of  sensor  selection  with  the 
rowwise  sparsity  of  the  collaboration  matrix.  Based  on  such  associations,  we  developed  a  unified 
optimization  framework  in  [60]  that  simultaneously  optimizes  the  collaboration  topology,  power 
allocation  and  sensor  selection  schemes.  We  showed  that  there  exists  an  optimal  sparse  collabora¬ 
tion  topology  given  limited  sensor  battery  power  [61,62],  and  a  trade-off  between  sensor  selection 
and  sensor  collaboration  [60].


Information-driven  sensor  selection 

We  derived  an  equivalent  Kalman  filter,  known  as  the  generalized  information  filter,  for  sensor
selection  [63, 64].  We  showed  that  under  a  regularity  condition  the  design  of  a  non-myopic  (multi-step-ahead)
sensor  selection  policy  is  equivalent  to  the  design  of  a  myopic  selection  policy  at  every  time
step.  We  obtained  near-optimal  sensor  selection  schemes  by  solving  convex  programs  such  as  lin¬ 
ear  programs  or  semidefinite  programs.  We  showed  that  the  proposed  sensor  selection  approach 
scales  gracefully  with  network  size.  We  also  considered  the  problem  of  sensor  selection  with 
sensing  uncertainty  [65],  where  with  the  aid  of  mutual  information  and  Fisher  information,  we 
developed  a  multiobjective  optimization  approach  to  strike  a  balance  between  the  estimation
accuracy  and  energy  usage.  When  the  measurement  noise  is  correlated,  we  derived  the  closed  form  of
the  Fisher  information  matrix  with  respect  to  sensor  selection  variables  [66,67].  We  theoretically 
showed  the  effect  of  noise  correlation  on  the  solutions  of  sensor  selection,  and  proposed  both  a 
convex  relaxation  approach  and  a  greedy  algorithm  to  find  these  solutions. 
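
A greedy selection rule of this kind can be sketched for a simplified case. The code assumes a linear Gaussian measurement model with uncorrelated noise, so that each sensor adds a rank-one term to the Fisher information, and uses log-determinant as the criterion. The correlated-noise case treated in [66,67] is not covered, and all names are illustrative assumptions.

```python
import numpy as np

def greedy_select(H, noise_var, k, prior_info):
    """Pick k rows of H greedily to maximize the log-determinant of the Fisher information."""
    selected = []
    info = prior_info.copy()
    for _ in range(k):
        best, best_value = None, -np.inf
        for i in range(H.shape[0]):
            if i in selected:
                continue
            # Sensor i contributes h_i h_i^T / sigma_i^2 under uncorrelated noise.
            candidate = info + np.outer(H[i], H[i]) / noise_var[i]
            value = np.linalg.slogdet(candidate)[1]
            if value > best_value:
                best, best_value = i, value
        selected.append(best)
        info = info + np.outer(H[best], H[best]) / noise_var[best]
    return selected, info
```

The greedy rule tends to avoid sensors whose measurement directions nearly duplicate ones already chosen, since they add little to the determinant.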

Economic  equilibria  based  sensor  management 

We  considered  two  different  economic  models,  market  equilibrium  [68]  and  mechanism  design  for 
sensor  management  [69-71].  We  proposed  a  framework  for  the  mobile  sensor  scheduling  problem 
in  target  localization  by  designing  an  equilibrium-based  two-sided  market  model.  For  the  myopic 
target  tracking  problem  in  a  wireless  sensor  network  containing  sensors  that  are  selfish  and  profit- 
motivated,  we  proposed  a  crowdsourcing  based  framework  by  designing  an  incentive-compatible 
mechanism  for  the  bandwidth  allocation  problem. 

Compressive  sensing  based  probabilistic  sensor  management 

We  developed  a  probabilistic  sensor  management  scheme  based  on  the  concepts  developed  in  com¬ 
pressive  sensing  [72].  In  the  proposed  scheme  where  each  sensor  transmits  its  observation  with  a 
certain  probability  via  a  coherent  multiple  access  channel,  the  observation  vector  received  at  the 
fusion  center  becomes  a  compressed  version  of  the  original  observations.  In  this  framework,  the 
sensor  management  problem  can  be  cast  as  the  problem  of  finding  the  probability  of  transmission 
at  each  node  so  that  a  given  performance  metric  is  optimized. 

4.5  Assured  Information  Fusion 

As  with  other  technical  problems  for  situational  awareness,  information  assurance  plays  an  integral 
part  in  ensuring  the  integrity  of  information  gathering  and  processing.  Within  this  context,  we  have 
studied  the  following  set  of  problems. 

Detection  in  presence  of  Byzantines 

We  have  considered  the  problem  of  distributed  detection  in  tree  topologies  in  the  presence  of 
Byzantines  in  [73].  The  expression  for  minimum  attacking  power  required  by  the  Byzantines  to 
blind  the  fusion  center  (FC)  is  obtained.  More  specifically,  we  show  that  when  more  than  a  certain 
fraction  of  individual  node  decisions  are  falsified,  the  decision  fusion  scheme  becomes  completely 
incapable.  We  obtain  closed  form  expressions  for  the  optimal  attacking  strategies  that  minimize 
the  detection  error  exponent  at  the  FC.  We  also  look  at  the  possible  counter-measures  from  the  FC’s
perspective  to  protect  the  network  from  these  Byzantines.  We  formulate  the  robust  topology  design 
problem  as  a  bi-level  program  and  provide  an  efficient  algorithm  to  solve  it.  Similar  analysis  has 
been  carried  out  for  the  problem  of  distributed  Bayesian  detection  in  the  presence  of  Byzantines 
in  the  network  [74].  We  analyze  the  problem  under  different  attacking  scenarios  and  derive  results 
for  different  non-asymptotic  cases.  It  is  found  that  existing  asymptotics-based  results  do  not  hold 
under  several  non-asymptotic  scenarios.  We  next  model  the  strategic  behavior  of  the  FC  and  the 
attacker  using  game  theory  and  show  the  existence  of  Nash  Equilibrium  [75].  Also,  we  obtain  the 
optimal  attacking  strategy  from  the  point  of  view  of  a  smart  adversary  to  disguise  itself  from  the 
proposed  detection  scheme  while  accomplishing  its  attack  [76]. 
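
The notion of blinding the fusion center can be illustrated in a much simpler setting than the tree topologies of [73]: a parallel network of identical sensors sending binary decisions, where each Byzantine always flips its decision. This simplification and the function names are assumptions made for illustration. In that setting the received decisions carry no information once half of the nodes are Byzantine, because the difference between the two received distributions scales with (1 - 2 * fraction).

```python
import math

def received_prob_one(honest_prob_one, byzantine_fraction):
    """P(received decision = 1) when Byzantines always flip their local decision."""
    a = byzantine_fraction
    return a * (1.0 - honest_prob_one) + (1.0 - a) * honest_prob_one

def binary_kl(p, q):
    """KL divergence between Bernoulli(p) and Bernoulli(q), in nats."""
    return sum(x * math.log(x / y) for x, y in ((p, q), (1 - p, 1 - q)) if x > 0)

def fusion_center_divergence(p_detect, p_false_alarm, byzantine_fraction):
    """KL divergence between the received-decision distributions under H0 and H1."""
    p1 = received_prob_one(p_detect, byzantine_fraction)
    p0 = received_prob_one(p_false_alarm, byzantine_fraction)
    return binary_kl(p0, p1)
```

The divergence shrinks as the Byzantine fraction grows and vanishes at one half in this simplified model; the blinding fraction for trees depends on the topology.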

Estimation  in  presence  of  Byzantines

We  have  considered  the  problem  of  target  localization  [77]  and  tracking  [78]  in  Wireless  Sensor 
Networks  (WSNs)  in  the  presence  of  malicious  sensors.  We  analyzed  the  effect  of  false  information 
from  the  Byzantines  on  target  state  estimation.  We  analytically  obtained  the  minimum  fraction 
of  Byzantines  that  blinds  the  fusion  center,  i.e.,  that  makes  the  local  sensor  data  useless  to  the 
fusion  center.  We  also  proposed  a  dynamic  non-identical  quantizer  design  to  reduce  the  effect 
of  Byzantines  on  tracking  performance.  Moreover,  for  the  localization  problem  with  non-ideal 
channels,  we  have  proposed  the  use  of  soft-decision  decoding  to  compensate  for  the  loss  due  to  the 
presence  of  fading  channels  between  the  local  sensors  and  the  FC. 

4.6  Other  Related  Work  for  Decentralized  Inference  and  Information  Fusion

Quantizer  Design  for  Distributed  Bayesian  Estimation 

We considered the problem of quantizer design for distributed estimation under the Bayesian
criterion [79, 80]. We showed that, for conditionally unbiased efficient estimators, identical
quantizers are optimal when all the sensors have the same number of decision regions. Under a
communication rate constraint on the network, we derived the conditions for the optimality of
binary quantizers. We showed that when the observations are Gaussian, identical binary quantizers
are optimal in the low-SNR regime. For the location parameter estimation problem with a given
prior distribution, we found the optimal binary quantizer by solving a differential equation.
We found a sufficient condition on the noise distribution under which threshold quantizers
attain the performance limit. By relaxing the assumption of conditionally independent
observations at the sensors, we also derived the optimality conditions for quantizers with
conditionally dependent observations.
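
As a rough numerical companion to the binary quantizer results, the sketch below searches for the threshold of a one-bit quantizer that maximizes the prior-averaged Fisher information for a location parameter in unit-variance Gaussian noise. The averaged Fisher information is used here only as a stand-in design criterion, and the zero-mean Gaussian prior, the grid search, and all names are assumptions; the criterion and the differential-equation solution in [79, 80] may differ.

```python
from math import erfc, exp, pi, sqrt


def bit_fisher(z):
    """Fisher information of the bit 1{x > tau}, where z = theta - tau
    and x = theta + standard Gaussian noise."""
    pdf = exp(-0.5 * z * z) / sqrt(2 * pi)
    p_one = 0.5 * erfc(-z / sqrt(2))
    p_zero = 0.5 * erfc(z / sqrt(2))
    return pdf * pdf / (p_one * p_zero)


def average_fisher(tau, prior_std=1.0, half_width=4.0, steps=801):
    """Riemann-sum approximation of E[bit_fisher(theta - tau)] for
    theta ~ N(0, prior_std^2), truncated to +/- half_width prior_std."""
    lo = -half_width * prior_std
    step = 2 * half_width * prior_std / (steps - 1)
    total = 0.0
    for i in range(steps):
        theta = lo + i * step
        weight = exp(-0.5 * (theta / prior_std) ** 2) / (prior_std * sqrt(2 * pi))
        total += weight * bit_fisher(theta - tau) * step
    return total


# Candidate thresholds on a coarse grid (illustrative choice).
thresholds = [k / 10 for k in range(-20, 21)]
best = max(thresholds, key=average_fisher)

if __name__ == "__main__":
    print("best threshold on grid:", best)
    print("one-bit Fisher information at z = 0:", bit_fisher(0.0))
```

For this symmetric setup the search selects the threshold at the prior mean, and the per-sample information of the bit at z = 0 equals 2/pi of the unquantized value of 1.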


Reliable  Crowdsourcing  for  Multi-Class  Labeling  Using  Coding  Theory 

We proposed the use of error-control codes and decoding algorithms to design crowdsourcing
systems for reliable classification despite unreliable crowd workers [78]. Coding-theory-based
techniques also allow us to pose easy-to-answer binary questions to the crowd workers. We
considered three different crowdsourcing models: systems with independent crowd workers, systems
with peer-dependent reward schemes, and systems where workers have common sources of
information. For each of these models, we analyzed classification performance with the proposed
coding-based scheme. We developed an ordering principle for the quality of crowds and
described how system performance changes with the quality of the crowd. We also showed that
pairing among workers and diversification of the questions help improve system performance.
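
The coding-based idea can be sketched as follows. Each class is assigned a binary codeword, each worker answers the binary question defined by one column of the code matrix, and the answers are decoded to the nearest codeword in Hamming distance. The four-class, seven-question code matrix and all names below are illustrative assumptions and are not taken from the cited work.

```python
# Rows are class codewords; column j defines the binary question for worker j.
# This illustrative matrix has minimum Hamming distance 4 between rows.
CODE_MATRIX = {
    "a": (0, 0, 0, 0, 0, 0, 0),
    "b": (0, 0, 0, 1, 1, 1, 1),
    "c": (0, 1, 1, 0, 0, 1, 1),
    "d": (1, 0, 1, 0, 1, 0, 1),
}


def question_for_worker(j):
    """Classes for which worker j should answer 1, i.e. the question
    'does the item belong to one of these classes?'."""
    return [label for label, word in CODE_MATRIX.items() if word[j] == 1]


def hamming(u, v):
    """Number of positions where two equal-length sequences differ."""
    return sum(x != y for x, y in zip(u, v))


def decode(answers):
    """Minimum Hamming distance decoding of the workers' binary answers."""
    return min(CODE_MATRIX, key=lambda label: hamming(CODE_MATRIX[label], answers))


if __name__ == "__main__":
    answers = list(CODE_MATRIX["c"])
    answers[2] ^= 1  # one unreliable worker gives the wrong answer
    print(decode(answers))
```

With this particular matrix any single wrong answer is corrected, since the received vector is at distance 1 from the true codeword and at distance at least 3 from every other one. A practical design would choose the matrix based on the number of classes, the number of workers, and their reliability.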